
Neural Information Processing Systems

R2/R3: 'To better judge the performance of the proposed method, it would be useful to include comparisons against existing approaches to the problem, such as [Meeds et al., NIPS 2007] and [Bittorf et al., NIPS 2012].' We have in the meantime conducted an experimental comparison to the LP approach of Bittorf et al. in the separable setting with binary T, and found that our approach is more robust to noise; we plan to add these results in the final version. The model of Meeds et al. involves two binary factors in a three-factor factorization and is hence different from our factorization model. A comparison can still be performed by running our approach in a two-step manner; provided that code can be obtained from Meeds et al., such a comparison will be included in the final version. Please note that the paper already contains a comparison to two methods based on alternating optimization (a standard approach to NMF), adapted to our specific factorization problem.



Neural Information Processing Systems

This paper shows that the skip-gram model of Mikolov et al., when trained with their negative sampling approach, can be understood as a weighted factorization of a word-context matrix whose cells are weighted by pointwise mutual information (PMI), which has long been known empirically to be a useful way of constructing word-context matrices for learning semantic representations of words. This is an important result, since it provides a link between two (apparently) very different methods for constructing word embeddings that both performed well empirically but seemed on the surface to have nothing to do with each other. Using this insight, the authors then propose a new matrix construction and find that it performs very well on standard tasks. The paper is mostly admirably clear (see below for a few suggestions on where citations could be added to make the relevant related work clear) and a very nice contribution toward explaining what is going on in these neural language model embedding methods.
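The matrix construction the review refers to can be illustrated with a minimal, standard-library sketch of a shifted positive PMI matrix (PMI minus log k, clipped at zero, where k is the number of negative samples). The helper name `sppmi_matrix` and the dictionary representation are illustrative choices, not code from the paper:

```python
import math
from collections import Counter

def sppmi_matrix(pairs, k=5):
    """Shifted positive PMI matrix over (word, context) co-occurrence pairs.

    pairs: iterable of (word, context) tuples observed in a corpus.
    k: number of negative samples; the shift subtracted from each PMI cell
       is log(k), and negative cells are clipped to zero.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    # Marginal counts for words and contexts, derived from the pair counts.
    w_counts = Counter()
    c_counts = Counter()
    for (w, c), n in pair_counts.items():
        w_counts[w] += n
        c_counts[c] += n
    sppmi = {}
    for (w, c), n in pair_counts.items():
        pmi = math.log(n * total / (w_counts[w] * c_counts[c]))
        sppmi[(w, c)] = max(pmi - math.log(k), 0.0)
    return sppmi
```

A dense variant of this matrix could then be factorized (e.g. by truncated SVD) to obtain word embeddings directly, which is the comparison point the review has in mind.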





Neural Information Processing Systems

Regarding our proof techniques: the proof of Thm. 1, for the NTK with two layers and bias, borrows techniques from [6]. Our proof technique for deep networks uses the algebra of RKHSs and is therefore novel in this context. Thm. 2 derives bounds that follow from the relation between the Fourier expansion of the Laplace kernel and the NTK (established in Thm. 4), together with identifying the spaces fixed under the appropriate integral transform. "Why they need additional parameters a, b, c": we note that, analogously, the NTK becomes sharper for deeper networks.




Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading

Walsh, Joel, Mamidanna, Siddarth, Nye, Benjamin, Core, Mark, Auerbach, Daniel

arXiv.org Artificial Intelligence

Research to improve Automated Short Answer Grading (ASAG) has recently focused on Large Language Models (LLMs) with prompt engineering and no- or few-shot prompting to achieve the best results. This is in contrast to the fine-tuning approach, which has historically required large-scale compute clusters inaccessible to most users. New closed-model approaches such as OpenAI's fine-tuning service promise results with as few as 100 examples, while methods using open weights, such as quantized low-rank adaptation (QLoRA), can be used to fine-tune models on consumer GPUs. We evaluate both of these fine-tuning methods, measuring their interaction with few-shot prompting for ASAG with structured (JSON) outputs. Our results show that fine-tuning with small amounts of data has limited utility for Llama open-weight models, but that fine-tuning methods can outperform few-shot baseline instruction-tuned LLMs for OpenAI's closed models. While our evaluation set is limited, we find some evidence that the observed benefits of fine-tuning may be affected by the domain subject matter. Lastly, we observed a dramatic improvement with the Llama 3.1 8B-Instruct open-weight model by seeding the initial training examples with a significant amount of cheaply generated synthetic training data.
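The combination of fine-tuning and structured (JSON) outputs described above can be sketched as a single training record in the chat-format JSONL that OpenAI-style fine-tuning services consume. The question, student answer, and rubric below are illustrative placeholders, not items from the paper's evaluation set:

```python
import json

# Hedged sketch of one ASAG training record: the assistant turn carries the
# grade as structured JSON, so the fine-tuned model learns to emit it.
record = {
    "messages": [
        {"role": "system",
         "content": 'Grade the student answer. Respond only with JSON: '
                    '{"score": <0-2>, "rationale": "..."}'},
        {"role": "user",
         "content": "Question: Why does ice float on water?\n"
                    "Student answer: Because ice is less dense than liquid water."},
        {"role": "assistant",
         "content": json.dumps({"score": 2,
                                "rationale": "Correctly cites lower density."})},
    ]
}
jsonl_line = json.dumps(record)  # one line of the training .jsonl file
```

One such line per graded example, accumulated into a `.jsonl` file, is the typical input format for chat-model fine-tuning; an analogous record structure can feed a QLoRA run on open-weight models.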